Design and Optimization of a Speech Recognition Front-End for Distant-Talking Control of a Music Playback Device
Abstract
This paper addresses the challenging scenario of distant-talking control of a music playback device: a common portable speaker with four small loudspeakers in close proximity to one microphone. The user controls the device by voice, and the speech-to-music ratio can be as low as −30 dB during music playback. We propose a speech enhancement front-end that relies on known robust methods for echo cancellation, double-talk detection, and noise suppression, as well as a novel adaptive quasi-binary mask that is well suited for speech recognition. The optimization of the system is then formulated as a large-scale nonlinear programming problem in which the recognition rate is maximized, and the optimal values of the system parameters are found through a genetic algorithm. We validate our methodology by testing on the TIMIT database for different music playback levels and noise types. Finally, we show that the proposed front-end allows natural interaction with the device for limited-vocabulary voice commands.
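The parameter search described in the abstract can be illustrated with a minimal genetic-algorithm sketch. This is not the authors' implementation: the function `genetic_maximize`, the selection, crossover, and mutation choices, and the toy objective `toy_recognition_rate` are all assumptions made for illustration. In the paper's setting the objective would be the recognition rate measured by running the recognizer on front-end output, which is far more expensive to evaluate.

```python
import random


def genetic_maximize(score, bounds, pop_size=30, generations=40,
                     mutation_scale=0.1, seed=0):
    """Maximize score(params) over box-constrained parameters.

    Hypothetical helper: truncation selection, uniform crossover,
    Gaussian mutation, and one elite individual per generation.
    """
    rng = random.Random(seed)

    def clip(value, lo, hi):
        return max(lo, min(hi, value))

    # Start from uniformly random parameter vectors inside the bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best = max(population, key=score)

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        if score(ranked[0]) > score(best):
            best = ranked[0]
        parents = ranked[:pop_size // 2]   # keep the better half as parents
        children = [best[:]]               # elitism: carry the best forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation scaled to each parameter's range, then clipped.
            child = [clip(g + rng.gauss(0.0, mutation_scale * (hi - lo)), lo, hi)
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = children

    return max(population + [best], key=score)


def toy_recognition_rate(params):
    """Stand-in objective (assumption): peaks at (0.3, 0.7), not a real ASR score."""
    x, y = params
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2)


if __name__ == "__main__":
    tuned = genetic_maximize(toy_recognition_rate, [(0.0, 1.0), (0.0, 1.0)])
    print(tuned, toy_recognition_rate(tuned))
```

A gradient-free search of this kind suits the problem because the recognition rate is not a differentiable function of the front-end parameters. The cost is that every candidate requires a full evaluation of the objective.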
Similar resources
Front-end processing of a distant-talking speech interface for control of an interactive TV system
Speech-Recognition Interfaces for Music Information Retrieval: 'Speech Completion' and 'Speech Spotter'
This paper describes music information retrieval (MIR) systems featuring automatic speech recognition. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. We propose two different speech-recognition interfaces for MIR, speech completion and speech spotter, and describe two MIR-based hands-free jukebo...
Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral domain denoising autoencoder (DAE)-based dereverberation are presented for distant-talking speaker identification, and a comb...
A Stereophonic Acoustic Front-End for Distant-Talking Interfaces based on Blind Source Separation
In this contribution, an acoustic front-end for distant-talking interfaces that only requires two microphone signals is presented. It comprises a directional blind source separation (BSS)-based noise and interference estimation scheme and Wiener-type filters for noise and interference suppression. The proposed front-end and its integration into a speech recognition system is analyzed and evaluat...
Suitable Design of Adaptive Beamform Spectrum for Noisy Speec
Recognition of distant-talking speech is indispensable for self-moving robots or tele-conference systems. However, background noise and room reverberations seriously degrade the sound capture quality in real acoustic environments. A microphone array is an ideal candidate as an effective method for capturing distant-talking speech. AMNOR (Adaptive Microphone-array for NOise Reduction) was propos...
Journal title:
- CoRR
Volume abs/1405.1379 Issue
Pages -
Publication date 2014